A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model
Authors
Abstract
Motivation: A popular approach for predicting RNA secondary structure is the thermodynamic nearest neighbor model, which finds the thermodynamically most stable secondary structure, i.e., the one with the minimum free energy (MFE). To improve on this, an alternative approach based on machine learning techniques has been developed. The machine learning based approach can employ a fine-grained model that includes much richer feature representations and can closely fit the training data. Although a machine learning based fine-grained model has achieved extremely high prediction accuracy, a risk of overfitting for such models has been reported.
Results: In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning based weighted approach. Our fine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed feature contexts. These scoring parameters are trained by a structured support vector machine (SSVM) with ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and no heavy overfitting is observed.
Availability: The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.
Contact: [email protected]
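To make the idea of combining fixed thermodynamic parameters with SSVM-trained weights under ℓ1 regularization more concrete, here is a minimal Python sketch on a deliberately toy model. It is not the mxfold implementation, and every name in it is hypothetical. It makes several simplifying assumptions. It uses a Nussinov-style base-pair DP instead of the full nearest-neighbor loop energy model. Its only features are base-pair types. `TOY_ENERGY` holds made-up placeholder values, not Turner parameters. Loss-augmented inference uses a pair-level Hamming loss. The ℓ1 term is handled by a subgradient step followed by soft-thresholding (a proximal step), which is one common way to train an ℓ1-regularized SSVM, though not necessarily the one the authors used.

```python
import math
from collections import Counter

# Hypothetical toy pair "free energies" (kcal/mol); NOT Turner parameters.
TOY_ENERGY = {"AU": -0.9, "UA": -0.9, "CG": -2.1, "GC": -2.1, "GU": -0.5, "UG": -0.5}
MIN_LOOP = 3  # a pair (i, j) must enclose more than MIN_LOOP - 1 unpaired bases

def pair_feature(seq, i, j):
    """Hypothetical feature for pair (i, j): just its pair type, e.g. 'GC'."""
    return seq[i] + seq[j]

def decode(seq, w, true_pairs=None):
    """Maximize the sum of pair scores, where each score is the negated toy
    energy plus a learned weight. If true_pairs is given, add a Hamming loss
    (+1 per false pair, -1 per true pair) for loss-augmented inference."""
    n = len(seq)

    def s(i, j):
        f = pair_feature(seq, i, j)
        v = -TOY_ENERGY[f] + w.get(f, 0.0)  # thermodynamic term + learned term
        if true_pairs is not None:
            v += -1.0 if (i, j) in true_pairs else 1.0
        return v

    dp = [[0.0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if seq[i] + seq[j] in TOY_ENERGY:       # i pairs with j
                best = max(best, dp[i + 1][j - 1] + s(i, j))
            for k in range(i + 1, j):               # bifurcation
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recompute the same expressions, so float equality is exact.
    pairs, stack = set(), [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif seq[i] + seq[j] in TOY_ENERGY and dp[i][j] == dp[i + 1][j - 1] + s(i, j):
            pairs.add((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return pairs

def features(seq, pairs):
    """Feature counts phi(y) for a structure given as a set of pairs."""
    return Counter(pair_feature(seq, i, j) for i, j in pairs)

def train(data, epochs=10, eta=0.1, lam=0.01):
    """Toy SSVM training: structured hinge-loss subgradient step, then
    soft-thresholding for the lam * ||w||_1 regularizer."""
    w = {}
    for _ in range(epochs):
        for seq, true_pairs in data:
            y_hat = decode(seq, w, true_pairs)  # loss-augmented inference
            diff = features(seq, true_pairs)    # phi(y) - phi(y_hat)
            diff.subtract(features(seq, y_hat))
            for f, v in diff.items():
                w[f] = w.get(f, 0.0) + eta * v
            for f in list(w):                   # proximal step for l1
                w[f] = math.copysign(max(abs(w[f]) - eta * lam, 0.0), w[f])
    return w
```

For example, `w = train([("GGGAAAUCC", {(0, 8), (1, 7)})])` followed by `decode("GGGAAAUCC", w)` trains on one toy structure and predicts with the learned weights. The soft-thresholding step drives weights that receive little support from the data to exactly zero. This is the sparsity effect that ℓ1 regularization is meant to provide when the feature set is very large.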
Similar resources
MMKnots: A max-margin model for RNA secondary structure prediction including pseudoknots
Motivation: The ideal algorithm for the prediction of pseudoknotted RNA secondary structures will provide fast and accurate predictions for pseudoknots of arbitrary complexity. However, existing algorithms are typically lacking on one of these three axes. Energy-based methods suffer from the intractability of pseudoknotted structure prediction under realistic energy models, while statistical ap...
PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars
Background: RNA molecules play many important regulatory, catalytic and structural ...
Relation Between RNA Sequences, Structures, and Shapes via Variation Networks
Background: RNA plays a key role in many aspects of biological processes and its tertiary structure is critical for its biological function. RNA secondary structure represents various significant portions of RNA tertiary structure. Since the biological function of RNA is concluded indirectly from its primary structure, it would be important to analyze the relations between the RNA sequences and t...
CONTRAfold: RNA secondary structure prediction without physics-based models
MOTIVATION For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally-measured thermodynamic paramete...
A Fugacity Approach for Prediction of Phase Equilibria of Methane Clathrate Hydrate in Structure H
In this communication, a thermodynamic model is presented to predict the dissociation conditions of structure H (sH) clathrate hydrates with methane as help gas. This approach is an extension of the Klauda and Sandler fugacity model (2000) for prediction of phase boundaries of sI and sII clathrate hydrates. The phase behavior of the water and hydrocarbon system is modeled using the Peng-Robinso...